Predicting the future of genetic risk prediction.

Authors

  • Nilanjan Chatterjee
  • Ju-Hyun Park
  • Neil Caporaso
  • Mitchell H Gail
Abstract

Recent genome-wide association studies (GWAS) have identified many susceptibility loci for a variety of complex traits. There is intense interest in evaluating the potential utility of these variants for individualized risk prediction for chronic diseases such as breast cancer. A recent article in this journal (1) evaluated the potential discriminatory accuracy of breast cancer risk models that use a variety of genetic variants, some of which, but not all, were found through GWAS. We comment on the methodology and conclusions of that paper and compare it with other work that has assessed the potential role of genetic variants for risk prediction and considered the possible impact of discoveries of new variants in future studies.

The paper by van Zitteren et al. (1) used simulations to estimate the area under the receiver operating characteristic curve (AUC) for breast cancer risk models based on single nucleotide polymorphisms (SNPs). From a review of meta-analyses and GWAS, they found 96 variants, 41 of which were nominally significant at the 0.05 level. They estimated the AUC as 0.67 based on these 41 variants and 0.68 based on all 96 variants. They also commented on the number of additional variants that would be needed to increase the AUC to a higher value, such as 0.80.

The work of van Zitteren et al. is interesting and surprising in several respects. Previous studies based on SNPs associated with breast cancer in GWAS yielded AUC estimates of 0.574 for 7 SNPs (2), 0.579 for 10 SNPs (3), and 0.585 for 11 SNPs (4). The SNPs in these studies attained genome-wide significance (P < 10 ), and the odds ratios per allele were based on independent data, thus reducing or eliminating the bias found in odds ratios in the discovery phase of GWAS, known as the "winner's curse" (5). It is very difficult to increase the AUC from 0.58 to 0.67 (2, 6). Park et al. (7) examined the prospects for increasing the AUC based on further GWAS.
They estimated the likely number and the distribution of effect sizes for yet-to-be-discovered SNPs based on the effect sizes of known susceptibility loci and the power of the original GWAS that led to these discoveries. For breast cancer, they estimated that within the range of effect sizes of known loci, there could be a total of 67 common susceptibility SNPs, including those already identified, which is almost 4 times the number of known SNPs. However, the incremental contribution of the undetected SNPs to the genetic variance is likely to be modest, because their effect sizes will tend to concentrate toward the lower end of the range. In particular, Park et al. estimated that the total of 67 susceptibility SNPs could explain 17.1% of the genetic variance of breast cancer and could lead to an AUC of only 0.635. Because detecting most of these additional SNPs at genome-wide significance would require very large sample sizes, they concluded that an AUC of 0.635 would be the practical upper limit of the discriminatory power of a risk model for breast cancer based on common susceptibility SNPs alone.

Thus, it is impressive that van Zitteren et al. report an AUC of 0.67 based on 41 nominally significant variants. On the basis of a log-normal approximation of risk (8), we estimated that to achieve an AUC of 0.67, a genetic risk model would need to explain 27.9% of the known heritability of breast cancer, where that heritability corresponds to a sibling recurrence risk of 2.0. In contrast, the 18 SNPs found in GWAS in Table 1 account for only 7.1% of the heritability. There has been recent speculation (9) on reasons why the SNPs from GWAS account for such a small portion of the theoretical heritability that one would anticipate from studies of familial aggregation of breast cancer (8, 10). Some possibilities are variants that contribute to risk but are not tagged by common SNPs in standard GWAS analyses.
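The link between AUC, sibling recurrence risk, and the fraction of heritability explained can be sketched numerically. The block below is a minimal illustration, not the authors' own calculation: it assumes a standard log-normal risk model for a rare disease, in which the log relative risk has variance sigma^2 in the population, the AUC equals Phi(sigma / sqrt(2)), and the sibling recurrence risk equals exp(sigma_total^2 / 2). These two formulas and all function names are assumptions introduced here for illustration.

```python
import math
from statistics import NormalDist

# Standard normal distribution, used for Phi and its inverse.
_STD_NORMAL = NormalDist()


def total_log_risk_variance(sibling_recurrence_risk: float) -> float:
    """Total variance of log risk implied by a sibling recurrence risk.

    Assumes lambda_s = exp(sigma_total^2 / 2) (log-normal polygenic model).
    """
    return 2.0 * math.log(sibling_recurrence_risk)


def auc_from_variance(variance: float) -> float:
    """AUC implied by a given variance of log risk: Phi(sigma / sqrt(2))."""
    return _STD_NORMAL.cdf(math.sqrt(variance / 2.0))


def variance_from_auc(auc: float) -> float:
    """Variance of log risk needed to reach a given AUC (inverse of the above)."""
    z = _STD_NORMAL.inv_cdf(auc)
    return 2.0 * z * z


def fraction_of_heritability(auc: float, sibling_recurrence_risk: float) -> float:
    """Share of the total log-risk variance a model must explain to reach `auc`."""
    return variance_from_auc(auc) / total_log_risk_variance(sibling_recurrence_risk)


if __name__ == "__main__":
    lambda_s = 2.0  # sibling recurrence risk quoted in the text
    for target_auc in (0.635, 0.67, 0.80):
        share = fraction_of_heritability(target_auc, lambda_s)
        print(f"AUC {target_auc:.3f} needs {100 * share:.1f}% of heritability")
    # AUC implied by explaining 7.1% of heritability
    print(f"{auc_from_variance(0.071 * total_log_risk_variance(lambda_s)):.3f}")
```

Under these assumptions, an AUC of 0.67 with a sibling recurrence risk of 2.0 requires roughly 28% of the heritability, in line with the 27.9% quoted above, and the 17.1% figure attributed to Park et al. maps to an AUC close to 0.635. The agreement suggests, but does not confirm, that the cited calculations rest on similar formulas.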
Such variants might include deletions, repeats, copy number variations, and uncommon and rare SNPs that are not in tight linkage disequilibrium with SNPs accessible on a GWAS platform, such as the Illumina Infinium 660K array. Thus, it is possible that van Zitteren et al. have tapped into some of the "missing heritability" to achieve an AUC of 0.67 from nominally significant variants drawn from the meta-analytic literature; some of these variants are deletions and repeats that might not be detected in GWAS. We will return to this possibility in discussing Table 1, which classifies the variants in van Zitteren et al. partly in terms of the ability of GWAS to detect them.

Another possibility is that some of the variants used by van Zitteren et al. are not truly associated with breast cancer and represent chance findings with possibly exaggerated odds ratios. Many of the meta-analyses cited by van Zitteren et al. focused on a "candidate gene." Findings from many early candidate gene studies were not reproducible, and a number of authors sought to explain why (11, 12). Important reasons for these failures were the

Authors' Affiliation: Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland


Journal:
  • Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology

Volume 20, Issue 1

Pages: -

Publication date: 2011